Introduction
Housekeeping, introductions
The views expressed in this presentation are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency or the U.S.G.S.
This land is ancestral and unceded land of the Chepenafa (Mary’s River) Band of the Kalapuya (pronounce “cal-uh-poo-yuh”), who have stewarded this beautiful area for generations. The descendants of the Kalapuya People are now part of the federally recognized Confederated Tribes of Grand Ronde and Confederated Tribes of the Siletz of Oregon. The peak you see from numerous vantage points around Corvallis is Mary’s Peak, which the Kalapuya called tcha Timanwi, or ‘place of spiritual power’, and is the highest peak in the Oregon Coast Range (4,097 feet).
Key Concepts Across Both Languages
Core Libraries
- Spatial data structure standards across languages and applications are primarily organized through OSGeo and the OGC, and a few core libraries underpin the spatial libraries in programming languages and in software applications (R, Python, QGIS, ArcGIS Pro)!
These libraries include:
PROJ –> Spatial projections, transformations
GEOS –> Geometry operations (measures, relations)
GDAL –> Raster and feature abstraction and processing (read, write)
NetCDF –> Multidimensional (XYZT) data abstraction (read, write)
Requesting web services (APIs)
More and more data are available as web services: the data are stored on a remote server in JSON, XML, or HTML format and are accessible through APIs.
JSON (JavaScript Object Notation) is a lightweight data-interchange format which is easy for machines to generate and parse; easy for humans to write and read.
Sample JSON format: { "data": "Click Here", "size": 36, "style": "bold", "name": "text1" }
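As a quick sketch (assuming the jsonlite package is available), a JSON string like the sample above can be parsed into a native R list and serialized back. Note that valid JSON requires straight double quotes and no trailing comma:

```r
library(jsonlite)

# Parse the sample JSON string into a named R list
txt <- '{ "data": "Click Here", "size": 36, "style": "bold", "name": "text1" }'
obj <- fromJSON(txt)

obj$size   # 36
obj$style  # "bold"

# Serialize back to JSON (auto_unbox keeps scalars from becoming length-1 arrays)
toJSON(obj, auto_unbox = TRUE)
```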
API - An Application Programming Interface (API) takes a structured request from an application and returns structured results from the host application. With a REST API you're getting a representation of the requested data stored on a server. A REST API is also what we call 'stateless', which means the server doesn't store any data between requests from clients.
REST APIs access data through uniform resource identifiers (URIs), which are essentially strings of characters that identify a specific resource. The type of URI used by a REST API is a uniform resource locator (URL).
HTTP clients are used for accessing the API. HyperText Transfer Protocol (HTTP) enables communication between the client and server using HTTP methods. If we want to access or manipulate resources, a REST API uses specific request verbs we need to become familiar with:
GET: used to retrieve data from a database
POST: used to add data to a database
PUT: used to update data in a database
DELETE: used to delete data from a database
Geospatial data representation fundamentals: simple features and geospatial grids
Vector data consist of points, lines, and polygons that represent discrete spatial entities, such as a river, watershed, or stream gauge.
Raster data divide space into rectilinear cells (pixels) to represent spatially continuous phenomena, such as elevation or weather. The cell size (or resolution) defines the fidelity of the data.
For vector data, Simple Features (officially Simple Feature Access) is both an OGC and an International Organization for Standardization (ISO) standard that specifies how (mostly) two-dimensional geometries can represent and describe objects in the real world. Simple Features includes:
- a class hierarchy
- a set of operations
- binary and text encodings
It describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them.
It outlines how the spatial element of POINT (an XY location in a specific coordinate reference system) extends to LINESTRING, POLYGON, and GEOMETRYCOLLECTION geometries.
The “simple” adjective also refers to the fact that the line or polygon geometries are represented by sequences of points connected with straight lines that do not self-intersect.
Simple and valid geometries and ring direction
This breakdown of simple features mostly follows this section in Spatial Data Science. For linestrings to be considered simple, they must not self-intersect:
Code
library(sf)
Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
Code
(ls <- st_linestring(rbind(c(0,0), c(1,1), c(2,2), c(0,2), c(1,1), c(2,0))))
LINESTRING (0 0, 1 1, 2 2, 0 2, 1 1, 2 0)
is_simple
FALSE
For polygons, several further conditions have to be met to be simple:
- polygon rings are closed (the last point equals the first)
- polygon holes (inner rings) are inside their exterior ring
- polygon inner rings maximally touch the exterior ring in single points, not over a line
- a polygon ring does not repeat its own path
- in a multi-polygon, an external ring maximally touches another exterior ring in single points, not over a line
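A minimal sf sketch of these conditions: a "bowtie" ring that crosses itself (our own toy example, not from the standard) fails the validity check, and st_make_valid() can repair it:

```r
library(sf)

# A self-crossing ("bowtie") ring violates the conditions above
bowtie <- st_polygon(list(rbind(c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0))))
st_is_valid(bowtie)   # FALSE

# st_make_valid() repairs it, typically by splitting it into a MULTIPOLYGON
fixed <- st_make_valid(bowtie)
st_is_valid(fixed)    # TRUE
```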
Z and M coordinates
As well as having the necessary X and Y coordinates, single point (vertex) simple features can have:
- a Z coordinate, denoting altitude, and/or
- an M value, denoting some “measure”
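For example, with sf a single point can carry Z and/or M values via the dim argument to st_point():

```r
library(sf)

st_point(c(1, 2, 30))                   # XYZ: third value is altitude (Z)
st_point(c(1, 2, 7), dim = "XYM")       # XYM: third value is a measure (M)
st_point(c(1, 2, 30, 7), dim = "XYZM")  # XYZM: both altitude and measure
```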
Text and binary encodings
A key part of the simple features standard is its text and binary encodings. The well-known text (WKT) encoding we have shown above gives us a human-readable description of the geometry. The well-known binary (WKB) encoding is machine-readable, lossless, and faster to work with than the text encoding. WKB is used for all interactions with GDAL and GEOS.
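A short sketch of the two encodings in sf: write a geometry out as WKT and WKB, then decode the WKB back into an identical geometry:

```r
library(sf)

pt <- st_sfc(st_point(c(-123.3, 44.6)))

# Well-known text: human readable
st_as_text(pt)

# Well-known binary: a list of raw vectors, lossless and fast to exchange
wkb <- st_as_binary(pt)

# Decode the binary back into the same geometry
st_as_sfc(wkb)
```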
Operations on geometries
We can break down operations on geometries for vector features in the following way:
- predicates: a logical asserting whether a certain property is TRUE
- measures: a quantity (a numeric value, possibly with a measurement unit)
- transformations: newly generated geometries
We can look at these operations by what they operate on, whether they are single geometries, pairs, or sets of geometries:
- unary when it’s a single geometry
- binary when it’s pairs of geometries
- n-ary when it’s sets of geometries
Unary predicates work to describe a property of a geometry.
A list of unary predicates:
| predicate | meaning |
|---|---|
| is | Tests if geometry belongs to a particular class |
| is_simple | Tests whether geometry is simple |
| is_valid | Tests whether geometry is valid |
| is_empty | Tests if geometry is empty |
A list of binary predicates is:
| predicate | meaning | inverse of |
|---|---|---|
| contains | None of the points of A are outside B | within |
| contains_properly | A contains B and B has no points in common with the boundary of A | |
| covers | No points of B lie in the exterior of A | covered_by |
| covered_by | Inverse of covers | covers |
| crosses | A and B have some but not all interior points in common | |
| disjoint | A and B have no points in common | intersects |
| equals | A and B are topologically equal: node order or number of nodes may differ; identical to A contains B and A within B | |
| equals_exact | A and B are geometrically equal, and have identical node order | |
| intersects | A and B are not disjoint | disjoint |
| is_within_distance | A is closer to B than a given distance | |
| within | None of the points of B are outside A | contains |
| touches | A and B have at least one boundary point in common, but no interior points | |
| overlaps | A and B have some points in common; the dimension of these is identical to that of A and B | |
| relate | Given a mask pattern, return whether A and B adhere to this pattern | |
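A few of these binary predicates in action with sf, using toy geometries of our own (two overlapping unit squares and a point):

```r
library(sf)

a <- st_sfc(st_polygon(list(rbind(c(0,0), c(2,0), c(2,2), c(0,2), c(0,0)))))
b <- st_sfc(st_polygon(list(rbind(c(1,1), c(3,1), c(3,3), c(1,3), c(1,1)))))
p <- st_sfc(st_point(c(0.5, 0.5)))

st_intersects(a, b, sparse = FALSE)  # TRUE: the squares overlap
st_overlaps(a, b, sparse = FALSE)    # TRUE: some, but not all, points in common
st_within(p, a, sparse = FALSE)      # TRUE: the point is inside a
st_disjoint(p, b, sparse = FALSE)    # TRUE: the point is outside b
```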
See the Geometries chapter of Spatial Data Science for a full treatment that also covers **unary and binary measures** as well as unary, binary, and n-ary transformers.
Raster data model
You can read the GDAL raster data model and OpenGIS Grid Coverages specification, but with a raster data model (a cell-based tessellation) you can read, write, and operate on raster data in numerous formats using GDAL:
Drainage basins and catchments - mainstems and flowpaths
When writing software for hydrologic data, concrete conceptual and logical definitions for the spatial features our software works with are critical to enabling interoperability. Along these lines, some key definitions to consider:
- Catchment: A physiographic unit with zero or one inlets and one outlet. A catchment is represented by one or more partial realizations; flowpath, divide, and networks of flowpaths and divides.
- Nexus: Conceptual outlet for water contained by a catchment. The hydro nexus concept represents the place where a catchment interacts with another catchment.
- Flowpath: A flowpath is a linear geometry that represents the connection between a catchment’s inlet and its outlet. All flowpaths have a local drainage area and may be aggregates of flowlines.
- Flowline: A flowline is a linear geometry that represents a segment of a flowing body of water. Some flowlines have no local drainage area and are never aggregate features.
- Drainage Basin: A catchment with zero inlets and one (internal or external) outlet.
- Mainstem: The flowpath of a drainage basin.
In the following, we create a representation of catchment divides and flowpaths with the hydrologic locations that represent their nexuses.
Note: both the divide and flowpath are said to be realizations of the overall catchment that they represent. In other words, the divide and flowpath are both part of the representation of a catchment.
Code
source(system.file("extdata/new_hope_data.R", package = "nhdplusTools"))
catchment <- sf::st_geometry(new_hope_catchment)
flowpath <- sf::st_geometry(dplyr::filter(new_hope_flowline, COMID %in% new_hope_catchment$FEATUREID))
nexus <- sf::st_geometry(nhdplusTools::get_node(flowpath, "end"))
plot_window <- sf::st_as_sfc(sf::st_bbox(dplyr::filter(new_hope_catchment, FEATUREID %in% c(8891178, 8895564))))
par(mar = c(0,0,0,0))
plot(plot_window, col = NA, border = NA)
plot(catchment, col = NA, border = "grey25", add = TRUE)
plot(flowpath, col = "dodgerblue3", add = TRUE)
plot(nexus, col = "springgreen", add = TRUE, pch = 20)

In this example, we create a drainage basin boundary (divide), the mainstem flowpath of the drainage basin, and the flowlines that make up the hydrographic network of the drainage basin.
Fun Fact: the word "watershed" has come to refer to the land encompassed by a drainage divide, but this usage is not strictly correct. A watershed is "a dividing ridge between drainage areas," which is in line with the non-hydrologic use of the word: a crucial dividing point, line, or factor.
Code
basin <- sf::st_geometry(sf::st_union(new_hope_catchment, by_feature = FALSE))
flowline <- sf::st_geometry(new_hope_flowline)
mainstem <- sf::st_geometry(new_hope_flowline[new_hope_flowline$LevelPathI == min(new_hope_flowline$LevelPathI),])
par(mar = c(0,0,0,0))
plot(basin, lwd = 2)
plot(flowline, col = "dodgerblue", add = TRUE)
plot(mainstem, col = "blue", lwd = 2, add = TRUE)

Demonstration of Key Concepts in Each Language
- drainage basins and catchments - mainstems and flowpaths
- geospatial data representation fundamentals, simple features and geospatial grids
Examples in R
sf: simple features
In R, the sf package provides “support for simple features, a standardized way to encode spatial vector data…. [and] Binds to ‘GDAL’ for reading and writing data, to ‘GEOS’ for geometrical operations, and to ‘PROJ’ for projection conversions and datum transformations.”
When using R, you're using an interface to the core community standards, software, and practices (this isn't exclusive to R). To highlight this, we can install (do this once) and attach sf to view the versions of the external libraries linked to sf.
Code
# install.packages("sf")
library(sf)
sf_extSoftVersion()
          GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ
       "3.11.1"        "3.6.2"        "9.1.1"         "true"         "true"        "9.1.1"
The bindings to these lower-level C libraries, and the larger sf ecosystem in R, can be seen below:
The sf implementation in the R ecosystem stores simple feature geometries (sfg) as part of a larger data.frame using a simple feature geometry list-column (sfc). The combination of attribute and spatial information defines a simple feature that can be operated on in both table (SQL) and spatial (GEOS, etc.) contexts. Not only does this let us make the most of the growing spatial community, but also of the growing data science community (see ggplot2, dplyr, data.table, dbplyr, arrow, etc.).
In practice, an sf object in R looks like the following:
We can break this down following examples presented in the recently published Spatial Data Science by Edzer Pebesma and Roger Bivand. The most common simple feature geometries used to represent a single feature are:
| type | description |
|---|---|
| POINT | single point geometry |
| MULTIPOINT | set of points |
| LINESTRING | single linestring (two or more points connected by straight lines) |
| MULTILINESTRING | set of linestrings |
| POLYGON | exterior ring with zero or more inner rings, denoting holes |
| MULTIPOLYGON | set of polygons |
| GEOMETRYCOLLECTION | set of the geometries above |
Code
library(sf) |> suppressPackageStartupMessages()
par(mfrow = c(2,4))
par(mar = c(1,1,1.2,1))
# 1
p <- st_point(0:1)
plot(p, pch = 16)
title("point")
box(col = 'grey')
# 2
mp <- st_multipoint(rbind(c(1,1), c(2, 2), c(4, 1), c(2, 3), c(1,4)))
plot(mp, pch = 16)
title("multipoint")
box(col = 'grey')
# 3
ls <- st_linestring(rbind(c(1,1), c(5,5), c(5, 6), c(4, 6), c(3, 4), c(2, 3)))
plot(ls, lwd = 2)
title("linestring")
box(col = 'grey')
# 4
mls <- st_multilinestring(list(
rbind(c(1,1), c(5,5), c(5, 6), c(4, 6), c(3, 4), c(2, 3)),
rbind(c(3,0), c(4,1), c(2,1))))
plot(mls, lwd = 2)
title("multilinestring")
box(col = 'grey')
# 5 polygon
po <- st_polygon(list(rbind(c(2,1), c(3,1), c(5,2), c(6,3), c(5,3), c(4,4), c(3,4), c(1,3), c(2,1)),
rbind(c(2,2), c(3,3), c(4,3), c(4,2), c(2,2))))
plot(po, border = 'black', col = '#ff8888', lwd = 2)
title("polygon")
box(col = 'grey')
# 6 multipolygon
mpo <- st_multipolygon(list(
list(rbind(c(2,1), c(3,1), c(5,2), c(6,3), c(5,3), c(4,4), c(3,4), c(1,3), c(2,1)),
rbind(c(2,2), c(3,3), c(4,3), c(4,2), c(2,2))),
list(rbind(c(3,7), c(4,7), c(5,8), c(3,9), c(2,8), c(3,7)))))
plot(mpo, border = 'black', col = '#ff8888', lwd = 2)
title("multipolygon")
box(col = 'grey')
# 7 geometrycollection
gc <- st_geometrycollection(list(po, ls + c(0,5), st_point(c(2,5)), st_point(c(5,4))))
plot(gc, border = 'black', col = '#ff6666', pch = 16, lwd = 2)
title("geometrycollection")
box(col = 'grey')
Code
p
POINT (0 1)
Code
mp
MULTIPOINT ((1 1), (2 2), (4 1), (2 3), (1 4))
Code
ls
LINESTRING (1 1, 5 5, 5 6, 4 6, 3 4, 2 3)
Code
mls
MULTILINESTRING ((1 1, 5 5, 5 6, 4 6, 3 4, 2 3), (3 0, 4 1, 2 1))
Code
po
POLYGON ((2 1, 3 1, 5 2, 6 3, 5 3, 4 4, 3 4, 1 3, 2 1), (2 2, 3 3, 4 3, 4 2, 2 2))
Code
mpo
MULTIPOLYGON (((2 1, 3 1, 5 2, 6 3, 5 3, 4 4, 3 4, 1 3, 2 1), (2 2, 3 3, 4 3, 4 2, 2 2)), ((3 7, 4 7, 5 8, 3 9, 2 8, 3 7)))
Code
gc
GEOMETRYCOLLECTION (POLYGON ((2 1, 3 1, 5 2, 6 3, 5 3, 4 4, 3 4, 1 3, 2 1), (2 2, 3 3, 4 3, 4 2, 2 2)), LINESTRING (1 6, 5 10, 5 11, 4 11, 3 9, 2 8), POINT (2 5), POINT (5 4))
This extends the idea of “tidy” data in that each row represents one observation, which has one geometric representation of the real world feature it describes.
An example of basic use:
Code
# Define file path
filename <- system.file("shape/nc.shp", package="sf")
# read in file
(nc <- read_sf(filename))
Simple feature collection with 100 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
Geodetic CRS: NAD27
# A tibble: 100 × 15
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74 SID74 NWBIR74
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl>
1 0.114 1.44 1825 1825 Ashe 37009 37009 5 1091 1 10
2 0.061 1.23 1827 1827 Alle… 37005 37005 3 487 0 10
3 0.143 1.63 1828 1828 Surry 37171 37171 86 3188 5 208
4 0.07 2.97 1831 1831 Curr… 37053 37053 27 508 1 123
5 0.153 2.21 1832 1832 Nort… 37131 37131 66 1421 9 1066
6 0.097 1.67 1833 1833 Hert… 37091 37091 46 1452 7 954
7 0.062 1.55 1834 1834 Camd… 37029 37029 15 286 0 115
8 0.091 1.28 1835 1835 Gates 37073 37073 37 420 0 254
9 0.118 1.42 1836 1836 Warr… 37185 37185 93 968 4 748
10 0.124 1.43 1837 1837 Stok… 37169 37169 85 1612 1 160
# ℹ 90 more rows
# ℹ 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
# geometry <MULTIPOLYGON [°]>
Code
# Map
plot(nc['SID79'])
Code
# Spatial measures!
head(st_area(nc))
Units: [m^2]
[1] 1137107793 610916077 1423145355 694378925 1520366979 967504822
Code
# Spatial operations!
{
st_union(nc) |> plot()
st_centroid(nc)$geometry |> plot(col = "red", add = TRUE)
}
Warning: st_centroid assumes attributes are constant over geometries
Code
# data science operations
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:terra':
intersect, union
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Code
{
plot(nc$geometry)
plot(slice_max(nc, AREA, n = 10)$geometry,
col = "red", add = TRUE)
plot(slice_max(nc, AREA, n =5)$geometry,
col = "yellow", add = TRUE)
plot(slice_max(nc, AREA, n = 1)$geometry,
col = "green", add = TRUE)
}
Measure (GEOS Measures)
Measures are the questions we ask about the dimensions of a geometry once a coordinate reference system has been established. We can compute:
- How long is a line or polygon perimeter? (unit)
- What is the area of a polygon? (unit^2)
- How far are two objects from one another? (unit)
- Measures come from the GEOS library
- Measures are calculated in the units of the projection
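To see why the coordinate reference system matters, we can compare the same segment measured with a geodetic CRS (where sf computes geodesic lengths in meters) and after projecting to CONUS Albers (EPSG:5070), a planar CRS whose unit is also the meter. This is a sketch with made-up coordinates near Corvallis, OR:

```r
library(sf)

# A one-degree east-west segment at 44 degrees N, in geographic coordinates
seg <- st_sfc(st_linestring(rbind(c(-123, 44), c(-122, 44))), crs = 4269)

st_length(seg)                      # geodesic length, returned in meters
st_length(st_transform(seg, 5070))  # planar length in the projection's unit (meters)
```

The two values are close but not identical, since one is computed on the ellipsoid/sphere and the other on the projected plane.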
We’ll demonstrate with a toy dataset of USGS gages in a data package from a previous workshop:
Code
# remotes::install_github("mhweber/awra2020spatial")
gages <- read.csv(system.file("extdata", "Gages_flowdata.csv", package = "awra2020spatial")) |>
dplyr::select(SOURCE_FEA, STATE, LAT_SITE, LON_SITE)
Rows: 2,771
Columns: 4
$ SOURCE_FEA <int> 14361500, 14344500, 10378500, 14341500, 14343000, 13092500,…
$ STATE <chr> "OR", "OR", "OR", "OR", "OR", "ID", "OR", "OR", "OR", "OR",…
$ LAT_SITE <dbl> 42.43040, 42.42763, 42.42488, 42.40819, 42.40263, 42.39648,…
$ LON_SITE <dbl> -123.3178, -122.6011, -119.9233, -122.6011, -122.5373, -114…
Using st_as_sf from sf, we can easily turn a data frame with coordinates into a simple features data frame:
Code
gages <- gages |>
sf::st_as_sf(coords = c("LON_SITE", "LAT_SITE"), crs = 4269)
Rows: 2,771
Columns: 3
$ SOURCE_FEA <int> 14361500, 14344500, 10378500, 14341500, 14343000, 13092500,…
$ STATE <chr> "OR", "OR", "OR", "OR", "OR", "ID", "OR", "OR", "OR", "OR",…
$ geometry <POINT [°]> POINT (-123.3178 42.4304), POINT (-122.6011 42.42763)…
What is the distance from the first stream gage to the second stream gage?
Code
sf::st_distance(gages[1,], gages[2,])
Units: [m]
[,1]
[1,] 58822.95
We can generate a toy area by buffering a point (to make a polygon) and then find its area and perimeter. We need to project into a planar CRS to get the perimeter, so we will project to an Albers projection using an EPSG code:
Code
poly <- sf::st_buffer(gages[1,],200)
sf::st_area(poly)
127234.3 [m^2]
Code
poly <- sf::st_transform(poly, 5070)
lwgeom::st_perimeter(poly)
1713.517 [m]
terra for raster data
We’ll look at using terra for working with raster data in R - stars is another library for working with raster data and spatiotemporal arrays (raster and vector data cubes) but for the sake of time we won’t demonstrate stars in this workshop.
terra builds on the older raster package and provides methods for low-level data manipulation as well as high-level global, local, zonal, and focal computations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported.
Like sf, terra links to GDAL, and the version and drivers can be viewed using package support functions:
Code
# install.packages("terra")
library(terra)
gdal()
[1] "3.6.2"
Code
DT::datatable(gdal(drivers = TRUE))

We can get raster data from the National Map family of APIs, in particular 3DEP elevation data, and work with it using terra or stars. We'll use Mike Johnson's AOI package to get a county polygon to use as a template to pass to terrainr for our area of interest.
Code
library(terrainr)
# remotes::install_github("mikejohnson51/AOI")
library(AOI)
The legacy packages maptools, rgdal, and rgeos, underpinning this package
will retire shortly. Please refer to R-spatial evolution reports on
https://r-spatial.org/r/2023/05/15/evolution4.html for details.
This package is now running under evolution status 0
Code
library(mapview)
mapviewOptions(fgb=FALSE)
AOI::aoi_get(list("Corvallis, OR", 10, 10)) |> mapview()
Code
# aoi <- AOI::aoi_get(state = "OR", county = "Benton")
aoi <- aoi_get(list("Corvallis, OR", 10, 10))
output_tiles <- get_tiles(aoi,
services = c("elevation", "ortho"),
resolution = 30 # pixel side length in meters
)
Code
terra::plot(terra::rast(output_tiles[["elevation"]][[1]]))
Code
terra::plotRGB(terra::rast(output_tiles[["ortho"]][[1]]), scale = 1)

We can also use the elevatr package by Jeff Hollister to access elevation data through web services and clip out elevation for a watershed basin we access through nhdplusTools and the NLDI.
Note that get_elev_raster from elevatr will generate a set of warnings, particularly regarding retirement of rgdal - the package will need to be updated to conform with retirement of rgdal, rgeos and maptools.
Code
library(nhdplusTools)
library(elevatr)
library(mapview)
mapviewOptions(fgb=FALSE)
# We're passing an identifier for the stream reach (and catchment) that we know is the downstream-most segment on the Calapooia River using the COMID below
start_comid = 23763529
nldi_feature <- list(featureSource = "comid", featureID = start_comid)
basin <- nhdplusTools::get_nldi_basin(nldi_feature = nldi_feature)
x <- get_elev_raster(basin, z = 12)
# x is returned as a raster object rather than a terra spatraster, so we need to convert to spatraster
mapview::mapview(basin) + mapview(x)
Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ;
the supplied Raster* has 19843980
... decreasing Raster* resolution to 5e+05 pixels
to view full resolution set 'maxpixels = 19843980 '
We can also crop the elevation raster to our basin using terra - note that mapview expects raster objects rather than terra SpatRasters, so we need to convert between terra and raster.
Code
library(raster)
Loading required package: sp
Attaching package: 'raster'
The following object is masked from 'package:dplyr':
select
Code
x <- terra::mask(terra::rast(x), basin)
x <- raster::raster(x)
mapview::mapview(basin, alpha.regions = 0.02, color='blue', lwd=2) + mapview(x)
Warning in rasterCheckSize(x, maxpixels = maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ;
the supplied Raster* has 19843980
... decreasing Raster* resolution to 5e+05 pixels
to view full resolution set 'maxpixels = 19843980 '
httr and jsonlite
In R, the httr and jsonlite packages are used to consume APIs that provide data in JSON format.
httr
httr provides us with an HTTP client to access an API with GET/POST methods, pass query parameters, and verify the response with regard to the data format.
jsonlite
This R package is used to convert a received JSON response into a readable R object or data frame. jsonlite can also convert R objects or data frames into JSON.
Along with these two packages, rlist in R can be used to perform additional manipulation on the received json response. list.stack and list.select are two important methods exposed by rlist that can be used to get parsed json data into a tibble.
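A small sketch of that rlist pattern on a hypothetical parsed-JSON list of records (the field names and values here are made up for illustration):

```r
library(rlist)

# Hypothetical parsed-JSON records, one list element per record
records <- list(
  list(id = "US", name = "United States", capitalCity = "Washington D.C."),
  list(id = "CA", name = "Canada",        capitalCity = "Ottawa")
)

# list.select() keeps the fields we want; list.stack() row-binds them into a table
df <- list.stack(list.select(records, id, name))
df
```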
To learn about these (and other) packages, type ?httr, ?jsonlite, and ?rlist in the console of RStudio to view the documentation.
Here we see an example using a JSON API from the World Bank with a GET request. There are a couple of ways of passing request parameters; first we'll embed the query directly in the URL of the request itself:
Code
library(httr)
library(jsonlite)
library(dplyr)
library(data.table)
jsonResponse <- httr::GET("http://api.worldbank.org/country?per_page=10&region=OED&lendingtype=LNX&format=json")

We can also generate the query by passing it in a separate list we create and adding it as a query parameter in the GET request:
Code
query<-list(per_page="10",region="OED",lendingtype="LNX",format="json")
jsonResponse <- httr::GET("http://api.worldbank.org/country", query = query)

The jsonlite package can be used to parse our response, which we indicated above should be returned as JSON. Why does the attempt below return an error?
Code
df <- jsonlite::fromJSON(jsonResponse)

HINT: what type of object is jsonResponse above?
Code
typeof(jsonResponse)
[1] "list"
We can also get information about the response using httr:
Code
http_type(jsonResponse)
[1] "application/json"
We have a list - let's take a look to figure out how to pull what we want into a data frame:
Code
names(jsonResponse)
 [1] "url"         "status_code" "headers"     "all_headers" "cookies"
[6] "content" "date" "times" "request" "handle"
It seems like content is what we want - how do we pull it out?
Code
df <- jsonlite::fromJSON(jsonResponse$content)

This still doesn't work - we need to do some further parsing of the response using content from httr, which we can then pass to jsonlite's fromJSON function to get the data into a data frame.
Code
jsonResponseParsed <- httr::content(jsonResponse, as="parsed")
is.list(jsonResponseParsed)
[1] TRUE
Code
names(jsonResponseParsed[[1]])
[1] "page"     "pages"    "per_page" "total"
Code
names(jsonResponseParsed[[2]])
NULL
Code
#Hmmm
is.list(jsonResponseParsed[[2]][[1]])
[1] TRUE
Code
names(jsonResponseParsed[[2]][[1]])
 [1] "id"          "iso2Code"    "name"        "region"      "adminregion"
[6] "incomeLevel" "lendingType" "capitalCity" "longitude" "latitude"
Code
# now we're getting somewhere...
names(jsonResponseParsed[[2]][[2]])
 [1] "id"          "iso2Code"    "name"        "region"      "adminregion"
[6] "incomeLevel" "lendingType" "capitalCity" "longitude" "latitude"
We convert the parsed json response list to data table using lapply and rbindlist:
Code
df <- lapply(jsonResponseParsed[[2]],as.data.table)
typeof(df) # still a list - one more step
[1] "list"
Code
dt <- rbindlist(df, fill = TRUE)

Some APIs, like GitHub's, are easier to pull directly into a data frame using fromJSON:
Code
myGitHubRepos <- fromJSON("https://api.github.com/users/mhweber/repos")
# It returns 79 variables about my repos - we can use select to just get the ones we want to see
myGitHubRepos <- myGitHubRepos |>
dplyr::select(name, stargazers_count, watchers_count, language, has_issues, forks_count)
head(myGitHubRepos)
                 name stargazers_count watchers_count         language
1 AccumulationScripts 0 0 Python
2 ArrayTools 0 0 Python
3 awra2020spatial 3 3 R
4 AWRA2022GeoWorkshop 6 6 Jupyter Notebook
5 AWRA_2020_R_Spatial 4 4 HTML
6 AWRA_GIS_R_Workshop 2 2 JavaScript
has_issues forks_count
1 TRUE 0
2 FALSE 0
3 TRUE 0
4 TRUE 5
5 TRUE 2
6 TRUE 3
Examples in Python (will be in a separate notebook in binder)
Use Cases
Hydro Addressing
Finding where along a river system some spatial data lies is a key use case for many modeling and analysis tasks. A full discussion is available in the nhdplusTools indexing and referencing vignette.
We need two inputs: lines to index to, and points that we want addresses for. With these inputs, nhdplusTools (and soon hydroloom) supports generation of hydro addresses with get_flowline_index().
The example below is as simple as possible. get_flowline_index() has a number of other capabilities, such as increased address precision and the ability to return multiple nearby addresses, that can be found in the function documentation.
Code
source(system.file("extdata/new_hope_data.R", package = "nhdplusTools"))
fline <- sf::st_transform(sf::st_cast(new_hope_flowline, "LINESTRING"), 4326)
point <- sf::st_sfc(sf::st_point(c(-78.97, 35.916)),
crs = 4326)
(address <- nhdplusTools::get_flowline_index(fline, point))
  id   COMID      REACHCODE REACH_meas       offset
1 1 8894158 03030002000064 58.0596 0.0002283261
Code
plot(sf::st_geometry(fline[fline$COMID %in% address$COMID,]))
plot(point, add = TRUE)
A natural next step once we’ve found a hydro address is to search upstream or downstream from the location we found. nhdplusTools and soon hydroloom offer a few upstream / downstream search functions. Here we’ll show a couple from nhdplusTools.
Code
up <- nhdplusTools::get_UT(fline, address$COMID)
um <- nhdplusTools::get_UM(fline, address$COMID)
dn <- nhdplusTools::get_DD(fline, address$COMID)
dm <- nhdplusTools::get_DM(fline, address$COMID)
plot(sf::st_geometry(fline), col = "grey", lwd = 0.5)
plot(sf::st_geometry(fline[fline$COMID %in% c(up, dn),]),
add = TRUE)
plot(sf::st_geometry(fline[fline$COMID %in% c(um, dm),]),
col = "blue", lwd = 2, add = TRUE)
plot(point, cex = 2, lwd = 2, add = TRUE)

The ability to navigate up or down a mainstem, rather than all connected tributaries and diversions, requires attributes that identify primary upstream and downstream paths. Without those attributes, navigation is still possible but the "main" path navigation method is not. Below, a new function available in hydroloom, navigate_network_dfs(), is shown.
Code
# remotes::install_github("DOI-USGS/nhdplusTools@2cb81da")
#| warning: false
net <- hydroloom::add_toids(sf::st_drop_geometry(fline),
return_dendritic = FALSE)
up <- hydroloom::navigate_network_dfs(net, address$COMID, direction = "up")
dn <- hydroloom::navigate_network_dfs(net, address$COMID, direction = "dn")
plot(sf::st_geometry(fline), col = "grey", lwd = 0.5)
plot(sf::st_geometry(fline[fline$COMID %in% c(unlist(up), unlist(dn)),]),
add = TRUE)
plot(point, cex = 2, lwd = 2, add = TRUE)

Catchment Characteristics and Accumulation
nhdplusTools (only in the latest hydroloom branch) and StreamCatTools can be used to retrieve a wide range of catchment characteristics. See the StreamCatTools vignette for further details and examples.
Code
library(nhdplusTools)
# remotes::install_github("USEPA/StreamCatTools")
library(StreamCatTools)
nhdplusTools::nhdplusTools_data_dir(tempdir())
cat <- sf::st_transform(new_hope_catchment, 4326)
streamcat_chars <- StreamCatTools::sc_get_params(param='name')
nawqa_chars <- nhdplusTools::get_characteristics_metadata()
StreamCatTools::sc_fullname("pctimp2016")
[1] "Mean Imperviousness 2016"
Code
nawqa_chars$description[nawqa_chars$ID == "CAT_TWI"]
[1] "Topographic wetness index, ln(a/S); where ln is the natural log, a is the upslope area per unit contour length and S is the slope at that point. See http://ks.water.usgs.gov/Kansas/pubs/reports/wrir.99-4242.html and Wolock and McCabe, 1995 for more detail"
Code
twi <- nhdplusTools::get_catchment_characteristics("CAT_TWI", cat$FEATUREID)
twi <- dplyr::rename(twi, CAT_TWI = "characteristic_value")
imp <- StreamCatTools::sc_get_data("pctimp2016", 'catchment', cat$FEATUREID)
cat <- dplyr::left_join(cat, imp, by = c("FEATUREID" = "COMID"))
# dplyr::left_join(twi, by = c("FEATUREID" = "comid"))
# plot(cat['CAT_TWI'])
plot(cat['PCTIMP2016CAT'])

With the two catalogs of characteristics above, we have access to nearly any characteristic imaginable, and both even offer downstream accumulations of the variables. However, sometimes it is necessary to sample and accumulate data not already available. nhdplusTools supports accumulating drainage area and length, and hydroloom, going forward, supports downstream accumulation more generically. Here, we'll use one of the examples pulled above to illustrate how this works.
Code
library(dplyr)
accum_df <- sf::st_drop_geometry(fline) |>
dplyr::select(COMID, FromNode, ToNode, Divergence, AreaSqKM) |>
hydroloom::add_toids() |>
dplyr::left_join(dplyr::select(cat, -AreaSqKM), by = c("COMID" = "FEATUREID")) |>
sf::st_sf() |>
dplyr::mutate(area_weighted_PCTIMP2016CAT =
ifelse(is.na(PCTIMP2016CAT) & AreaSqKM == 0,
yes = 0, no = AreaSqKM * PCTIMP2016CAT))
accum_df <- accum_df |>
dplyr::mutate(tot_PCTIMP2016CAT =
hydroloom::accumulate_downstream(accum_df, "area_weighted_PCTIMP2016CAT") /
hydroloom::accumulate_downstream(accum_df, "AreaSqKM"))
plot(cat['PCTIMP2016CAT'])
Code
plot(accum_df['tot_PCTIMP2016CAT'])
Data access summary – web services and scalability
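The data access shown throughout leans on web services that return JSON. As a minimal sketch of what handling such a payload looks like (using Python's standard json module; the payload mirrors the sample from the introduction):

```python
import json

# A small JSON payload like those returned by REST APIs
payload = '{"data": "Click Here", "size": 36, "style": "bold", "name": "text1"}'

# Parse the JSON string into a Python dict
parsed = json.loads(payload)
print(parsed["name"], parsed["size"])
```

Client packages such as StreamCatTools, nhdplusTools, and the HyRiver suite wrap exactly this request-and-parse cycle behind friendlier functions.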
R and Python Interoperability
We can use reticulate to interoperate easily between R and Python within RStudio and share objects between the two languages
Code
library(reticulate)
# reticulate::install_miniconda()
# create a new environment conda environment
# conda_create("r-reticulate")
# install NumPy and SciPy
# conda_install("r-reticulate", "scipy")
# conda_install("r-reticulate", "numpy")
# conda_install("r-reticulate", "pandas")
# indicate that we want to use a specific condaenv
use_condaenv("r-reticulate")
np <- import("numpy")
print(np$version$full_version)
[1] "1.24.3"
Simple examples from reticulate - calling Python:
Import a module and call a function from Python:
Code
os <- import("os")
os$listdir(".") [1] ".git" ".gitignore"
[3] ".ipynb_checkpoints" ".Rhistory"
[5] ".Rproj.user" "3dep.ipynb"
[7] "cache" "classification.ipynb"
[9] "Dam_Impact.html" "Dam_Impact.ipynb"
[11] "Dam_Impact.Rmd" "environment.yml"
[13] "ICRW8_Geospatial_Workshop.Rproj" "img"
[15] "README.md" "Relative_Elevation_Model.ipynb"
[17] "Slides.html" "Slides.ipynb"
[19] "Slides.qmd" "Slides.rmarkdown"
[21] "Slides_files"
Object conversion - When Python objects are returned to R they are converted to R objects by default, but you can work with native Python types by passing convert = FALSE to the import function
Code
# import numpy and specify no automatic Python to R conversion
np <- import("numpy", convert = FALSE)
# do some array manipulations with NumPy
a <- np$array(c(1:4))
print(a)
array([1, 2, 3, 4])
Code
sum <- a$cumsum()
# convert to R explicitly at the end
print(py_to_r(a))
[1] 1 2 3 4
Code
py_to_r(sum)
[1] 1 3 6 10
Passing Dataframes between Python and R:
Code
pd <- import("pandas")
penguins <- pd$read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/modeldata/penguins.csv")
dplyr::glimpse(penguins)
Rows: 344
Columns: 8
$ `Unnamed: 0` <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "A…
$ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", …
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NaN, 36.7, 39.3, 38.9, 39.2, 34.1,…
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NaN, 19.3, 20.6, 17.8, 19.6, 18.1,…
$ flipper_length_mm <dbl> 181, 186, 195, NaN, 193, 190, 181, 195, 193, 190, 18…
$ body_mass_g <dbl> 3750, 3800, 3250, NaN, 3450, 3650, 3625, 4675, 3475,…
$ sex <list> "male", "female", "female", NaN, "female", "male", …
Or like this:
Code
import pandas as pd
penguins = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/modeldata/penguins.csv")
print(penguins.head())
  Unnamed: 0 species island ... flipper_length_mm body_mass_g sex
0 1 Adelie Torgersen ... 181.0 3750.0 male
1 2 Adelie Torgersen ... 186.0 3800.0 female
2 3 Adelie Torgersen ... 195.0 3250.0 female
3 4 Adelie Torgersen ... NaN NaN NaN
4 5 Adelie Torgersen ... 193.0 3450.0 female
[5 rows x 8 columns]
Call Python from R:
Code
penguins <- py$penguins
dplyr::glimpse(penguins)
Rows: 344
Columns: 8
$ `Unnamed: 0` <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ species <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "A…
$ island <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", …
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NaN, 36.7, 39.3, 38.9, 39.2, 34.1,…
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NaN, 19.3, 20.6, 17.8, 19.6, 18.1,…
$ flipper_length_mm <dbl> 181, 186, 195, NaN, 193, 190, 181, 195, 193, 190, 18…
$ body_mass_g <dbl> 3750, 3800, 3250, NaN, 3450, 3650, 3625, 4675, 3475,…
$ sex <list> "male", "female", "female", NaN, "female", "male", …
Call R from Python:
Code
data(iris)
iris <- iris |>
dplyr::filter(Petal.Length > 3)
Code
print(r.iris.head())
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
0 7.0 3.2 4.7 1.4 versicolor
1 6.4 3.2 4.5 1.5 versicolor
2 6.9 3.1 4.9 1.5 versicolor
3 5.5 2.3 4.0 1.3 versicolor
4 6.5 2.8 4.6 1.5 versicolor
Code
print(r.iris['Petal.Length'].min())
3.3
Resources
General
R
- Hydroinformatics in R: Extensive notes and exercises for a course on data analysis techniques in hydrology using the programming language R
- Spatial Data Science by Edzer Pebesma and Roger Bivand
- Geocomputation with R
- r-spatial: Suite of fundamental packages for working with spatial data in R
- Working with Geospatial Hydrologic Data Using Web Services (R)
- Accessing REST API (JSON data) using httr and jsonlite
Python
- Datashader: Accurately render even the largest data
- HyRiver: a suite of Python packages that provides a unified API for retrieving geospatial/temporal data from various web services
- Python Foundation for Spatial Analysis
- Python for Geographic Data Analysis
- gdptools A Python package for grid- or polygon-polygon area-weighted interpolation statistics
- Intro to Python GIS
- xarray: An open-source project and Python package that makes working with labeled multi-dimensional arrays simple, efficient, and fun!
- rioxarray: Rasterio xarray extension.
- GeoPandas: An open-source project to make working with geospatial data in python easier.
- OSMnx: A Python package that lets you download and analyze geospatial data from OpenStreetMap.
- Xarray Spatial: Implements common raster analysis functions using numba and provides an easy-to-install, easy-to-extend codebase for raster analysis.